6 research outputs found

    Contributions to Deep Learning Models

    [EN] Deep Learning is a new area of Machine Learning research that aims to create computational models that learn several representations of the data using deep architectures. These methods have become very popular over the last few years due to the remarkable results obtained in speech recognition, visual object recognition, object detection, natural language processing, etc. The goal of this thesis is to present some contributions to the Deep Learning framework, particularly focused on computer vision problems dealing with images. These contributions can be summarized in two novel methods: a new regularization technique for Restricted Boltzmann Machines called Mask Selective Regularization (MSR), and a powerful discriminative network called the Local Deep Neural Network (Local-DNN). On the one hand, the MSR method takes advantage of the benefits of the L2 and L1 regularization techniques. Both regularizations are applied dynamically to the parameters of the RBM according to the state of the model during training and the topology of the input space. On the other hand, the Local-DNN model is based on two key concepts: local features and deep architectures. Similar to convolutional networks, the Local-DNN learns from local regions of the input image using a deep neural network. The network aims to classify each local feature according to the label of the sample to which it belongs, and all of these local contributions are taken into account during testing using a simple voting scheme. The methods proposed throughout the thesis have been evaluated in several experiments using various image datasets.
The results obtained show the strong performance of these approaches, particularly on gender recognition using face images, where the Local-DNN improves on other state-of-the-art results.
Mansanet Sandín, J. (2016). Contributions to Deep Learning Models [Unpublished doctoral thesis]. Universitat Politècnica de València. https://doi.org/10.4995/Thesis/10251/61296

    Local Deep Neural Networks for gender recognition

    Deep learning methods are able to automatically discover better representations of the data to improve the performance of classifiers. However, in computer vision tasks such as gender recognition, it is sometimes difficult to learn directly from the entire image. In this work we propose a new model called the Local Deep Neural Network (Local-DNN), which is based on two key concepts: local features and deep architectures. The model learns from small overlapping regions in the visual field using discriminative feed-forward networks with several layers. We evaluate our approach on two well-known gender benchmarks, showing that our Local-DNN outperforms the other deep learning methods evaluated and obtains state-of-the-art results on both benchmarks. (C) 2015 Elsevier B.V. All rights reserved. This work was financially supported by the Ministerio de Ciencia e Innovación (Spain), Plan Nacional de I+D+i, TEC2009-09146, and the FPI grant BES-2010-032945. Mansanet Sandín, J.; Albiol Colomer, A.; Paredes Palacios, R. (2016). Local Deep Neural Networks for gender recognition. Pattern Recognition Letters. 70:80-86. https://doi.org/10.1016/j.patrec.2015.11.015
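The local-features-plus-voting pipeline described above can be sketched roughly as follows. This is an illustrative reconstruction rather than the authors' code: the patch size, stride, and the random posteriors standing in for a trained per-patch network are all placeholders.

```python
import numpy as np

def extract_patches(image, patch_size=8, stride=4):
    """Collect overlapping local patches by sliding a window over the image."""
    h, w = image.shape
    patches = []
    for y in range(0, h - patch_size + 1, stride):
        for x in range(0, w - patch_size + 1, stride):
            patches.append(image[y:y + patch_size, x:x + patch_size].ravel())
    return np.stack(patches)

def vote(patch_posteriors):
    """Fuse per-patch class posteriors with a simple sum vote; return the winner."""
    return int(np.argmax(patch_posteriors.sum(axis=0)))

# Toy run: a random image, and random posteriors in place of the per-patch network.
rng = np.random.default_rng(0)
image = rng.random((32, 32))
patches = extract_patches(image)            # one feature vector per local region
posteriors = rng.random((len(patches), 2))  # hypothetical 2-class network outputs
predicted_label = vote(posteriors)
```

At test time every patch contributes its posterior, so a few ambiguous regions cannot dominate the image-level decision.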

    Restricted Boltzmann Machines for Gender Classification

    This paper deals with automatic feature learning using a generative model called the Restricted Boltzmann Machine (RBM) for the problem of gender recognition in face images. The RBM is presented together with some practical learning tricks that improve its learning capabilities and speed up the training process. The performance of the features obtained is compared against several linear methods using the same dataset and the same evaluation protocol. The results show a classification accuracy improvement over classical linear projection methods. Moreover, to increase the classification accuracy even further, we have run experiments where an SVM is fed with the non-linear mapping obtained by the RBM in a tandem configuration. Mansanet Sandin, J.; Albiol Colomer, A.; Paredes Palacios, R.; Villegas, M.; Albiol Colomer, AJ. (2014). Restricted Boltzmann Machines for Gender Classification. Lecture Notes in Computer Science. 8814:274-281. doi:10.1007/978-3-319-11758-4_30
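RBMs of this kind are typically trained with Hinton's contrastive divergence. A minimal numpy sketch of one CD-1 update for a binary RBM, with placeholder learning rate and batch, is:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_step(W, b_v, b_h, v0, lr=0.1, rng=None):
    """One contrastive-divergence (CD-1) update for a binary RBM.
    W: (visible, hidden) weights; b_v, b_h: biases; v0: a batch of binary data."""
    if rng is None:
        rng = np.random.default_rng(0)
    ph0 = sigmoid(v0 @ W + b_h)                # p(h=1 | v0), positive phase
    h0 = (rng.random(ph0.shape) < ph0) * 1.0   # sampled hidden states
    pv1 = sigmoid(h0 @ W.T + b_v)              # one-step reconstruction
    ph1 = sigmoid(pv1 @ W + b_h)               # p(h=1 | v1), negative phase
    n = len(v0)
    W = W + lr * (v0.T @ ph0 - pv1.T @ ph1) / n
    b_v = b_v + lr * (v0 - pv1).mean(axis=0)
    b_h = b_h + lr * (ph0 - ph1).mean(axis=0)
    return W, b_v, b_h
```

The "tandem configuration" in the abstract then simply uses the hidden activations `sigmoid(v @ W + b_h)` as the feature vector fed to an SVM.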

    Estimating Point of Regard with a Consumer Camera at a Distance

    In this work, we have studied the viability of a novel technique to estimate the point of regard (POR) that only requires the video feed from a consumer camera. The system can work under uncontrolled lighting conditions and does not require any complex hardware setup. To that end, we propose a system that uses PCA feature extraction from the eye region followed by non-linear regression, and we evaluate three state-of-the-art non-linear regression algorithms. In the study, we also compared performance using a high-quality webcam versus a Kinect sensor. We found that, despite the relatively low quality of the Kinect images, it achieves performance similar to the high-quality camera. These results show that the proposed approach could be extended to estimate the POR in a completely non-intrusive way. Mansanet Sandín, J.; Albiol Colomer, A.; Paredes Palacios, R.; Mossi García, JM.; Albiol Colomer, AJ. (2013). Estimating Point of Regard with a Consumer Camera at a Distance. In Pattern Recognition and Image Analysis. Springer Verlag. 7887:881-888. doi:10.1007/978-3-642-38628-2_104
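The PCA-plus-non-linear-regression pipeline can be illustrated with a self-contained numpy sketch. Kernel ridge regression here is a generic stand-in for the regressors evaluated in the paper, and all data, dimensions, and hyperparameters below are synthetic placeholders:

```python
import numpy as np

def pca_fit(X, k):
    """Fit PCA via SVD: return the mean and the top-k principal directions."""
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]

def rbf_kernel(A, B, gamma=0.1):
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def krr_fit(Z, y, lam=1e-3, gamma=0.1):
    """Kernel ridge regression: solve (K + lam*I) alpha = y."""
    K = rbf_kernel(Z, Z, gamma)
    return np.linalg.solve(K + lam * np.eye(len(Z)), y)

def krr_predict(Z_train, alpha, Z_test, gamma=0.1):
    return rbf_kernel(Z_test, Z_train, gamma) @ alpha

# Toy calibration: map flattened eye-region crops to 2-D screen coordinates.
rng = np.random.default_rng(0)
X = rng.random((50, 100))   # synthetic eye-region feature vectors
y = rng.random((50, 2))     # synthetic screen (x, y) fixation targets
mu, V = pca_fit(X, 10)
Z = (X - mu) @ V.T          # low-dimensional PCA projection
alpha = krr_fit(Z, y)
pred = krr_predict(Z, alpha, Z)
```

An initial per-user calibration phase (as in the paper) would collect the `X`/`y` pairs by asking the subject to fixate known screen points.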

    Técnicas de regresión para la estimación de la localización de la mirada

    [EN] This work describes the implementation of a system for estimating a person's gaze on a screen, known in the literature as the point of regard (POR). First, we carried out a preliminary comparative study under controlled conditions to determine which kind of algorithm works best with the system created. To demonstrate the system, we built a real-time application that estimates the POR of a person keeping their head fixed. We used consumer cameras, obtaining an inexpensive system that estimates the gaze after an initial training phase for each person. Mansanet Sandín, J. (2012). Técnicas de regresión para la estimación de la localización de la mirada. http://hdl.handle.net/10251/18034

    Mask selective regularization for restricted Boltzmann machines

    In the present work, we propose to deal with two important issues regarding the RBM's learning capabilities: first, the topology of the input space, and second, the sparseness of the RBM obtained. One problem of RBMs is that they do not take advantage of the topology of the input space. To alleviate this shortcoming, we propose to use a surrogate of the mutual information of the input representation space to build a set of binary masks. This approach is general and not only applicable to images, so it can be extended to other layers in the standard layer-by-layer unsupervised learning. On the other hand, we propose a selective application of two different regularization terms, L1 and L2, to ensure the sparseness of the representation and the generalization capabilities. Additionally, another interesting capability of our approach is the adaptation of the topology of the network during the learning phase by selecting the set of binary masks that best fits the current weight configuration. The performance of these new ideas is assessed with a set of experiments on several well-known corpora. (C) 2015 Elsevier B.V. All rights reserved. This work was financially supported by the Ministerio de Ciencia e Innovación (Spain), Plan Nacional de I+D+i, Grant TEC2009-09146, and the FPI Grant BES-2010-032945. Mansanet Sandín, J.; Albiol Colomer, A.; Paredes Palacios, R.; Albiol Colomer, AJ. (2015). Mask selective regularization for restricted Boltzmann machines. Neurocomputing. 165:375-383. https://doi.org/10.1016/j.neucom.2015.03.026
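The selective L1/L2 idea can be sketched as a masked update rule. The exact assignment of penalties is defined in the paper; the split below (L1 soft-thresholding outside the binary mask, plain L2 weight decay inside it) is an illustrative assumption, as are all hyperparameter values:

```python
import numpy as np

def msr_update(W, grad, mask, lr=0.01, l1=1e-4, l2=1e-4):
    """One gradient step with mask-selective regularization (illustrative split):
    weights outside the binary mask get an L1 penalty via soft-thresholding,
    driving them to exact zeros; weights inside the mask get L2 weight decay."""
    W = W - lr * grad  # plain gradient step first
    return np.where(
        mask,
        W - lr * l2 * W,                                   # L2 on active weights
        np.sign(W) * np.maximum(np.abs(W) - lr * l1, 0.0)  # L1 elsewhere
    )
```

Because the L1 branch produces exact zeros, re-selecting the mask during training (as the abstract describes) can adapt the network topology to the current weight configuration.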